ABSTRACT

The construct validity of four self-concept (SC)
traits (general SC, academic SC, English SC, mathematics SC), as
measured by three different scales (Likert, semantic differential,
Guttman) for low-track (n = 252) and high-track (n = 588) Canadian high
school students, was assessed using both the Campbell-Fiske criteria and
a comparison of hierarchically nested covariance structure models.
Confirmatory factor analysis was used to model hypotheses related to
convergent and discriminant validity and to test directly the
equivalencies of traits and methods. Findings indicate that
assumptions of invariant construct validity cannot be taken for
granted; differences in both the measurement and the structure of SC
were found. Academic SC, as measured by the Likert and Guttman scales,
was problematic for the high track. These scales appeared to elicit
different types of responses from high- and low-ability students.
Tests of invariance formally confirmed this result. Discriminant
validity of the trait factors was also less clear for the high track,
but this may have been a measurement problem. Method bias was clearly
more evident for the high than for the low track. Method bias effects
for each scale type, as well as all but one trait correlation, were
found to be noninvariant. A 5-page list of references and eight
tables are included. (LPG)
Abstract

The construct validity of four self-concept (SC) traits
(general SC, academic SC, English SC, mathematics SC), as
measured by three different measurement scales (Likert,
semantic differential, Guttman) for low-track (n = 252) and
high-track (n = 588) high school students, was assessed using both
the Campbell-Fiske criteria and a comparison of hierarchically
nested covariance structure models. Confirmatory factor
analysis was used to model hypotheses related to convergent and
discriminant validity and to test directly the equivalencies of
traits and methods. Findings indicate that assumptions of
invariant construct validity cannot be taken for granted;
differences in both the measurement and the structure of SC were
found. The study has important implications for substantive
research that compares mean differences in multidimensional SCs
across populations and, in particular, differences in general,
academic, English, and mathematics SCs across ability levels of
high school students.
Multitrait-Multimethod Analyses of Three Self-Concept Scales:
Testing Equivalencies of Construct Validity Across Ability

A wealth of self-concept (SC) research has focused on mean
differences in multidimensional SCs across ability (see Byrne,
1984; Wylie, 1979). Testing for these differences rests on two
important assumptions: (a) evidence of the construct validity of
SC measures and constructs within each group, and (b) the
equivalence of SC measures and constructs across groups (Cole &
Maxwell, 1985). In substantive research, however, these
assumptions are implicit in the comparison of groups and are
rarely tested directly. The present study, in broad terms,
assesses the construct validity of a multidimensional SC
structure as measured by three different measurement scales,
and tests the equivalencies of construct validity across two
ability levels of high school students.

In construct validation, a researcher seeks empirical
evidence in support of hypothesized construct relations (a)
among facets of the same construct (within-network relations)
and (b) among different constructs (between-network relations).
These theoretical linkages represent the nomological network of
a hypothesized construct (Cronbach & Meehl, 1955). Although
construct validation encompasses an interplay of theory
construction, test development, and data collection (Shavelson,
Hubner, & Stanton, 1976), the two processes are complementary
rather than [...]. That is to say, given an adequate theory, one
[...] instrument; given an adequate instrument, [...] can be
tested. Construct validation, then, is an ongoing process
involving hypotheses that need to be challenged repeatedly with
counterhypotheses (Anastasi, 1986; Cronbach, 1971; Cronbach &
Meehl, 1955).

Campbell and Fiske (1959) posited that claims of construct
validity must be accompanied by evidence of both convergent and
discriminant validity. As such, a measure should correlate
highly with other measures to which it is theoretically linked
(convergent validity) and correlate negligibly with those that
are theoretically unrelated (discriminant validity). To
determine evidence of construct validity, they proposed that
multiple traits be assessed by multiple methods and that all
trait-method correlations be arranged in a multitrait-multimethod
(MTMM) matrix. The assessment of construct validity then focuses
on comparisons among three blocks of correlations: (a) scores on
the same traits measured by different methods (monotrait-heteromethod
values, i.e., convergent validity), (b) scores on different traits
measured by the same method (heterotrait-monomethod values, i.e.,
discriminant validity), and (c) scores on different traits
measured by different methods (heterotrait-heteromethod values,
i.e., discriminant validity). Specific criteria guide the
inspection of these values and are described later.
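To make the three blocks concrete, the following Python sketch (an illustration, not part of the original analysis) labels every off-diagonal cell of an MTMM matrix by comparing the trait and the method of the two measures involved. Each measure is encoded as a hypothetical `(trait, method)` tuple, and the names `TRAITS`, `METHODS`, `classify_pair`, and `block_counts` are our own; only the 4-trait by 3-method layout is taken from the text.

```python
from itertools import combinations

# Illustrative labels for the 4 traits x 3 methods design described in the text.
TRAITS = ["general", "academic", "english", "math"]
METHODS = ["likert", "semdiff", "guttman"]


def classify_pair(m1, m2):
    """Return the MTMM block for two distinct (trait, method) measures."""
    (t1, k1), (t2, k2) = m1, m2
    if t1 == t2 and k1 == k2:
        raise ValueError("a measure paired with itself lies on the diagonal")
    if t1 == t2:
        return "monotrait-heteromethod"   # convergent validity values
    if k1 == k2:
        return "heterotrait-monomethod"   # bears on discriminant validity
    return "heterotrait-heteromethod"     # bears on discriminant validity


def block_counts(traits, methods):
    """Count how many off-diagonal correlations fall in each block."""
    measures = [(t, m) for m in methods for t in traits]
    counts = {}
    for a, b in combinations(measures, 2):
        block = classify_pair(a, b)
        counts[block] = counts.get(block, 0) + 1
    return counts


if __name__ == "__main__":
    print(block_counts(TRAITS, METHODS))
```

For the 4 by 3 design the 66 off-diagonal correlations split into 12 monotrait-heteromethod, 18 heterotrait-monomethod, and 36 heterotrait-heteromethod values.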

While the seminal work of Campbell and Fiske (1959)
represents a major contribution to the field of psychometrics,
researchers have noted several shortcomings in their procedure
(see, e.g., Hubert & Baker, 1978; Kavanagh, MacKinney, &
Wolins, 1971; Marsh & Hocevar, 1983; Schmitt, 1978; Widaman,
1985). In particular, many researchers have criticized the
subjectivity of the criteria upon which construct validity is
based and have proposed alternative quantitative methodologies
(for a review, see Schmitt & Stults, 1986).

One methodologically more sophisticated approach to
assessing construct validity within the MTMM framework is the
analysis of covariance structures using the confirmatory factor
analytic (CFA) procedure originally proposed by Joreskog
(1971), and now commercially available to researchers through
the LISREL VI computer program (Joreskog & Sorbom, 1985). The
relative merits of CFA in analyzing MTMM matrices are now well
documented (see, e.g., Marsh & Hocevar, 1983; Schmitt & Stults,
1986; Widaman, 1985). Compared with the Campbell-Fiske
procedure, the major advantages of CFA relevant to the present
paper are as follows: (a) the MTMM matrix is explained in terms
of the underlying latent constructs rather than the observed
variables, thus separating out the influence of measurement
error; (b) the evaluation of convergent and discriminant
validities can be made at both the matrix and individual
parameter levels; (c) based on a series of hierarchically nested
models, hypotheses related to convergent and discriminant
validities can be tested statistically; and (d) separate
estimates of variance due to traits, methods, and
error/uniquenesses are provided.

The validity of SC has been examined within an MTMM design
using both Campbell-Fiske and CFA procedures. Evidence of
convergent and discriminant validity for both trait and method
factors, and support for the multidimensional structure of SC
for students from Grade 5 through college, have been reported
(Marsh & O'Neill, 1984; Marsh, Parker, & Smith, 1983; Marsh,
Smith, Barnes, & Butler, 1983). In particular, general SC,
academic SC, English SC, and mathematics SC, although
correlated, could be measured as separate constructs. Other
construct validity studies of SC measures have generally
reported moderate evidence of convergent validity with other SC
measures and/or external criteria. However, evidence of
discriminant validity has been inconsistent (see Byrne, 1984,
for a review).

The construct validity of different measurement scales has
also been examined within an MTMM framework using both
Campbell-Fiske and CFA procedures. Findings have been consistent in
reporting evidence of convergent validity for Likert, semantic
differential, and Guttman scales (Flamer, 1983; Jaccard, Weber,
& Lundmark, 1975; Kothandapani, 1971; Ostrom, 1969). Evidence
of discriminant validity, however, has been inconsistent.
Modest method bias for the Likert and Guttman scales has been
reported (Kothandapani, 1971). However, in a reanalysis of the
Ostrom and Kothandapani MTMM data using CFA, Bagozzi (1978) and
Schmitt (1978) reported opposing conclusions regarding the
convergent and discriminant validity findings (but see Widaman,
1985). Finally, Flamer's CFA confirmed his earlier findings and
also revealed a method-trait interaction: Likert and semantic
differential scales differed in the way they measured a
particular trait.

Although each of these studies used either Campbell-Fiske
or CFA procedures to examine construct validity within an MTMM
framework, none examined data either for a particular ability
group (e.g., low track) or across ability groups (e.g., low
track vs. high track). Cole and Maxwell (1985), however, have
noted that evidence of construct validity within one population
in no way guarantees its equivalence across populations. As a
case in point, Byrne and Shavelson (in press) found differences
in the way English and mathematics SCs related to general and
academic SCs for adolescent males and females; they also found
significant gender differences in the reliability of certain
measuring instruments. Indeed, findings from substantive
studies of academic tracking in high school suggest the
possibility of parallel construct validity differences based on
SC responses from low- and high-ability students. For example,
low-track students have been shown to have weaker reading
comprehension skills than high-track students (Addy, Henderson,
& Knox, 1980). As such, their interpretation of test items on
particular measurement scales may differ from that of their
high-track peers. Such differences would bear importantly on the
construct validity of the measures and of the traits underlying
them.

The present study has three purposes: first, to assess the
construct validity of four SC traits (general SC, academic SC,
English SC, mathematics SC), as measured by three different
measurement scales (Likert, semantic differential, Guttman),
for low- and high-track students; second, to compare construct
validity findings based on two different approaches to
analyzing MTMM matrices, namely the Campbell-Fiske criteria and
confirmatory factor analysis; and finally, to test directly the
equivalencies of SC measurements and structure across academic
high school tracks.
Method



Sample and Procedure 

The original sample comprised 988 Grade 11 and 12 students
(324 low track, 664 high track) from two suburban high schools
in Ottawa, Canada. Following listwise deletion of missing data,
the final sample size was 840 (252 low track, 588 high track).
The data approximated a normal distribution, with skewness
ranging from -1.19 to .19 (M = -.27) for low-track and from
-1.26 to .10 (M = -.50) for high-track students; kurtosis
ranged from -.53 to 1.60 (M = .23) for the low track and from
-.92 to 1.83 (M = .27) for the high track. Since English is
part of the core curriculum for high schools in Ontario (i.e.,
compulsory), it was known that all students were enrolled in at
least one English course; therefore, only mathematics
classes were tested for the study.
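The data-screening steps mentioned above (listwise deletion, then inspection of skewness and kurtosis) can be sketched in plain Python. This is a hypothetical illustration: missing values are assumed to be coded as `None`, and the formulas are the simple moment-based versions (kurtosis reported as excess kurtosis, so a normal distribution gives 0). The original analysis may have used bias-corrected estimators, so exact values could differ slightly.

```python
def listwise_delete(rows):
    """Drop any case (row) with a missing value (None) on any measure."""
    return [row for row in rows if all(v is not None for v in row)]


def skew_kurtosis(values):
    """Moment-based skewness and excess kurtosis (both 0 for a normal distribution)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def column_shape(rows):
    """Skewness and kurtosis for each measure (column) after listwise deletion."""
    complete = listwise_delete(rows)
    return [skew_kurtosis(list(col)) for col in zip(*complete)]
```

Summarizing the per-measure values by their minimum, maximum, and mean within each track yields figures of the kind reported above.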

In the province of Ontario, tracking in high school
applies only to the core curricula. For each academic
subject (e.g., mathematics, science, history, geography,
English, French), two courses are structured: one designed to
meet the needs of high-ability students (advanced-level
courses) and the other, low-ability students (general-level
courses). General-level courses are considered "appropriate
preparation for employment or further education in colleges and
other non-university educational institutions" (Ontario
Ministry of Education, 1979-81, p. 7). Although a definition of
high and low academic tracks has not been formalized by the
Ontario Ministry of Education, most Ontario secondary schools
in general (King & Hughes, 1985), and the participating schools
in the present study in particular, classify low-track students
as those taking two or more of their mathematics and science
courses at the general level in any given year; all other
students are considered high-track.
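The low-track rule stated above can be expressed as a small predicate. The course-record encoding used here (a list of hypothetical `(subject, level)` pairs for one school year) and the subject labels are assumptions made for illustration; a real transcript would list specific science courses (e.g., chemistry, physics) that would need to be mapped to "science" first.

```python
# Hypothetical subject labels; only the counting rule comes from the text.
MATH_SCIENCE = {"mathematics", "science"}


def classify_track(courses):
    """Return 'low' if two or more math/science courses are general level, else 'high'.

    courses: iterable of (subject, level) pairs for one school year,
    where level is "general" or "advanced".
    """
    general_count = sum(
        1 for subject, level in courses
        if subject in MATH_SCIENCE and level == "general"
    )
    return "low" if general_count >= 2 else "high"
```

For example, a student taking general-level mathematics and general-level science would be classified as low track, whereas one taking general-level mathematics and advanced-level science would be high track.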

A battery of SC instruments (described below) was
administered to intact classroom groups during one 50-minute
period. The testing was completed approximately two weeks
following the April report cards. The students therefore had
the opportunity to become fully cognizant of their academic
performance prior to completing the tests for the study. This
factor was considered important in the measurement of academic
and subject-specific SCs.
Instrumentation 

The SC test battery consisted of 12 instruments: three
measures for each of general SC, academic SC, English SC, and
mathematics SC. All instruments were self-report rating scale
formats and were designed for use with a high school
population. They were selected because they purported to
measure (with some justification) the SC facets in the theory
to be tested.

Likert scale. The Self Description Questionnaire III (SDQ III;
Marsh & O'Neill, 1984) is structured on an 8-point Likert-type
scale with responses ranging from "1-Definitely False" to
"8-Definitely True." The General-Self subscale contains twelve
items and was used to measure general SC. Academic SC, English
SC, and mathematics SC were measured by the Academic SC, Verbal
SC, and Mathematics SC subscales, respectively; each contained
10 items. Internal consistency reliability coefficients ranging
from .86 to .93 (Md = .90) for these subscales, and strong
support for their construct validity based on interpretations
consistent with the Shavelson et al. (1976) model of SC, have
been reported (Byrne & Shavelson, 1986; Marsh & O'Neill, 1984).

Semantic differential scale. The Affective Perception
Inventory (API; Soares & Soares, 1979) is a semantic
differential scale with a forced-choice format containing four
categories arranged along a continuum between two bipolar
terms (e.g., "happy"-"unhappy"). The Self Concept, Student
Self, English Perceptions, and Mathematics Perceptions
subscales were used to measure general SC, academic SC, English
SC, and mathematics SC, respectively. The number of items
comprising each of the API subscales is as follows: Self
Concept, 25; Student Self, 25; English Perceptions, 22;
Mathematics Perceptions, 17. Internal consistency coefficients
ranging from .79 to .95 (Md = .85) have been reported for these
subscales (Byrne & Shavelson, 1986; Soares & Soares, 1980).
Convergent validity coefficients ranging from .49 to .55 (Md r
= .50) with peer ratings and from .57 to .74 (Md r = [...]) with
teacher ratings for the same subscales, as well as evidence of
discriminant validity, have also been reported (Soares &
Soares, 1980).

Guttman scales. The Self-Esteem Scale (SES; Rosenberg,
1965) is a 10-item Guttman scale based on a 4-point format
ranging from "strongly agree" to "strongly disagree"; it was
used to measure general SC. A test-retest reliability of .62
(Byrne, 1983) and an internal consistency reliability
coefficient of .87 (Byrne & Shavelson, 1986) have been reported,
as well as convergent validities ranging from .56 to .67 (see
Byrne, 1983). The Brookover Self Concept of Ability Scale (SCAS;
Brookover, 1962), also a Guttman scale, uses a 5-point response
format. Respondents are asked to rank their ability in
comparison with others on a scale from "1-I am the poorest" to
"5-I am the best." Form A was used to measure academic SC;
Forms B and C were used to measure English SC and mathematics
SC, respectively. Items on Forms B and C are identical to those
on Form A, except that they elicit responses relative to
specific academic content (e.g., "How do you rate your ability
in English (mathematics) compared with your close friends?").
Test-retest and internal consistency reliability coefficients
ranging from .69 to .72 and from .77 to .94, respectively, have
been reported (see Byrne, 1983; Byrne & Shavelson, 1986).
Analysis of the Data

Responses to negatively worded items were reversed so that,
for all instruments, the highest response code indicated a
positive rating of SC. Additionally, the first item on the
API Self Concept subscale ("I am masculine"-"I am feminine")
was recoded so that its scoring was contingent on gender.
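Reversing a negatively worded item on a scale that runs from `low` to `high` maps a response `x` to `low + high - x`, so that the highest code always indicates the most positive SC. A minimal sketch, assuming integer response codes; the function name is illustrative, and the gender-contingent recoding of the API item is not covered because the text does not specify its direction.

```python
def reverse_score(x, low, high):
    """Reverse-code one response on an integer scale running from low to high."""
    if not low <= x <= high:
        raise ValueError(f"response {x} outside the {low}-{high} range")
    return low + high - x
```

On the 8-point SDQ III format, for instance, `reverse_score(1, 1, 8)` returns 8, and on a 4-point format `reverse_score(3, 1, 4)` returns 2.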

The data were analyzed in three stages. First, zero-order
correlations among all measures were arranged in an MTMM matrix
and then examined separately for each track for evidence of
construct validity based on the Campbell-Fiske criteria. Second,
using CFA procedures, a 7-factor model of the data comprising
four trait factors (general SC, academic SC, English SC,
mathematics SC) and three method factors (Likert, semantic
differential, Guttman scales) was proposed and tested
separately for each track. A schematic representation of this
model is presented in Figure 1. Finally, equivalencies of SC
measurements and structure were tested across track.
Insert Figure 1 about here

Campbell-Fiske Criteria. Campbell and Fiske (1959) proposed
four criteria for evaluating convergent and discriminant
validity. These criteria are:

1. The convergent validities should be significantly
different from zero and sufficiently large to warrant further 
investigation of validity* 

2. The convergent validities should be higher than 
correlations between different traits assessed by different 
methods (heterotrait-heteromethod blocks). 

3. The convergent validities should be higher than
correlations between different traits assessed by the same
method (heterotrait-monomethod blocks).

4. The pattern of correlations between different traits
should be the same in both the heteromethod and monomethod
blocks.

For each track, comparisons of the various blocks of
correlations involved determining the proportion of times that
these criteria were satisfied.
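The proportion-of-comparisons tally for Criteria 2 and 3 can be sketched as follows. Each convergent validity r(t_j, t_k) (trait t measured by methods j and k) is compared, for Criterion 2, with the heterotrait-heteromethod values in its row and column of the j-k block and, for Criterion 3, with the heterotrait-monomethod values involving t under method j and under method k. This row-and-column convention is one common reading of Campbell and Fiske; it yields 72 comparisons per criterion for a 4 by 3 design and so does not reproduce the 36 and 18 comparison counts reported in the Results, which appear to have been tallied differently. The dictionary-of-correlations encoding and all names are illustrative assumptions.

```python
from itertools import combinations

TRAITS = ["general", "academic", "english", "math"]
METHODS = ["likert", "semdiff", "guttman"]


def corr_of(corr, a, b):
    """Correlation between measures a and b; corr is keyed by frozenset({a, b})."""
    return corr[frozenset((a, b))]


def criterion2(corr, traits, methods):
    """Validity vs. heterotrait-heteromethod values sharing its row or column."""
    satisfied = total = 0
    for mj, mk in combinations(methods, 2):
        for t in traits:
            validity = corr_of(corr, (t, mj), (t, mk))
            for other in traits:
                if other == t:
                    continue
                for hh in (corr_of(corr, (t, mj), (other, mk)),    # same row
                           corr_of(corr, (other, mj), (t, mk))):   # same column
                    total += 1
                    satisfied += validity > hh
    return satisfied, total


def criterion3(corr, traits, methods):
    """Validity vs. heterotrait-monomethod values involving the same trait."""
    satisfied = total = 0
    for mj, mk in combinations(methods, 2):
        for t in traits:
            validity = corr_of(corr, (t, mj), (t, mk))
            for other in traits:
                if other == t:
                    continue
                for hm in (corr_of(corr, (t, mj), (other, mj)),
                           corr_of(corr, (t, mk), (other, mk))):
                    total += 1
                    satisfied += validity > hm
    return satisfied, total
```

Dividing `satisfied` by `total` gives the proportion of comparisons meeting each criterion for one track.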

Confirmatory Factor Analysis. For each track, a 7-factor
model comprising four traits and three methods was hypothesized
and tested for convergent and discriminant validity by means of
(a) comparisons with alternatively specified models and (b)
examination of individual parameter estimates. All CFA analyses
were conducted using LISREL VI (Joreskog & Sorbom, 1985).

Traditionally, in covariance structure analysis, the extent
to which a proposed model fits the observed data has been based
on the χ² likelihood ratio test. However, problems related to
the dependency of χ² on sample size have been noted (see, e.g.,
Bentler & Bonett, 1980). Thus, in addition to the statistical
fit of a model, a measure of its practical fit must also be
considered (Widaman, 1985). To this end, Bentler and Bonett
proposed a normed index of fit (delta) that ranges from 0.0 to
1.0. Joreskog (Joreskog, 1971; Joreskog & Sorbom, 1985), among
others, has posited that assessment of model fit should be
based on multiple criteria. This was accomplished in the
present study by using (a) the χ² likelihood ratio, (b) the
χ²/degrees of freedom (df) ratio, (c) the delta index, (d)
T-values and modification indices provided by the LISREL VI
program, and (e) knowledge of substantive and theoretical
research in this area.
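Two of the practical-fit summaries listed above reduce to simple arithmetic. Bentler and Bonett's normed fit index compares a model's χ² with that of the null model (here Model 1): delta = (χ²_null − χ²_model) / χ²_null, which lies between 0 and 1 whenever the model fits at least as well as the null model. A minimal sketch; the function names are illustrative and the numbers in the example are invented, not taken from Tables 2 and 3.

```python
def chi2_df_ratio(chi2, df):
    """Chi-square divided by its degrees of freedom."""
    if df <= 0:
        raise ValueError("df must be positive")
    return chi2 / df


def normed_fit_index(chi2_null, chi2_model):
    """Bentler-Bonett (1980) normed fit index (delta) relative to a null model."""
    if chi2_null <= 0:
        raise ValueError("null-model chi-square must be positive")
    return (chi2_null - chi2_model) / chi2_null
```

With an invented null-model χ² of 1000 and a model χ² of 100, `normed_fit_index(1000, 100)` gives .90.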

To establish various validity criteria, the proposed
7-factor model was tested against a series of more restrictive
models in which specific parameters were either eliminated or
constrained to equal zero. Since the difference in χ² (Δχ²) is
itself χ²-distributed, with degrees of freedom equal to the
difference in degrees of freedom for the two models, the fit
differential between comparison models can be tested
statistically. A significant Δχ² argues for the superiority of
the less restrictive model. Additionally, the difference in
practical fit can be noted (see Widaman, 1985, for a more
detailed discussion of these model comparisons).
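LISREL reports χ² and df for each model; the nested-model comparison itself is a subtraction plus a χ² upper-tail probability. The standard-library Python sketch below uses the textbook closed forms of the regularized upper incomplete gamma function for integer df (computed in log space to avoid overflow); this is an illustrative choice, not a procedure described in the paper, and the function names are our own.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X > x) of a chi-square variable with integer df >= 1."""
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    y = x / 2.0
    if df % 2 == 0:
        # Q(df/2, y) = exp(-y) * sum_{i=0}^{df/2 - 1} y**i / i!
        return sum(math.exp(i * math.log(y) - math.lgamma(i + 1) - y)
                   for i in range(df // 2))
    # Odd df: Q(n + 1/2, y) = erfc(sqrt(y)) + exp(-y) * sum_{i=1}^{n} y**(i - 1/2) / Gamma(i + 1/2)
    n = (df - 1) // 2
    tail = sum(math.exp((i - 0.5) * math.log(y) - math.lgamma(i + 0.5) - y)
               for i in range(1, n + 1))
    return math.erfc(math.sqrt(y)) + tail


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free, alpha=0.05):
    """Nested-model comparison: the restricted model has more df than the free one."""
    d_chi2 = chi2_restricted - chi2_free
    d_df = df_restricted - df_free
    if d_df <= 0:
        raise ValueError("the restricted model must have more degrees of freedom")
    p = chi2_sf(d_chi2, d_df)
    return d_chi2, d_df, p, p < alpha  # True means the less restrictive model fits better
```

A significant result (last element `True`) favors the less restrictive model, matching the decision rule stated above.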

The parameter estimates for trait and method factor
loadings, trait intercorrelations, method intercorrelations,
and estimated error/uniquenesses were examined with respect to
magnitude and statistical significance, the latter being
determined by the z-ratio (parameter estimate/standard error),
which is printed as a T-value by LISREL VI. T-values > 2.00 are
considered statistically significant at the .05 level (Joreskog
& Sorbom, 1985).
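The significance rule above amounts to dividing each estimate by its standard error and comparing the result with 2.00. A minimal sketch; taking the absolute value so that negative estimates (for example, negative factor correlations) are judged by magnitude is our assumption, since the text states only the positive cutoff.

```python
def t_value(estimate, standard_error):
    """z-ratio of a parameter estimate (printed as a T-value by LISREL)."""
    if standard_error <= 0:
        raise ValueError("standard error must be positive")
    return estimate / standard_error


def is_significant(estimate, standard_error, cutoff=2.00):
    """Flag |estimate / SE| above the cutoff (2.00 is roughly p < .05, two-tailed)."""
    return abs(t_value(estimate, standard_error)) > cutoff
```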

Tests of Invariance. Testing for the equivalency of traits
and methods involved the comparison of a series of models in
which certain parameters were constrained to be equal across
track with less restrictive models in which these parameters
were free to take on any value. The difference in χ², as
described above, was used to determine the statistical
significance of the hypotheses tested.

Results 

Construct Validity Based on Campbell-Fiske Criteria 
The matrices of zero-order correlations, computed
separately for each track, are presented in Table 1, together
with the means, standard deviations, and internal consistency
alpha reliabilities. Results are entered below the main
diagonal for the low track and above the main diagonal for the
high track.
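For reference, coefficient alpha (as reported in Table 1) is computed from item-level scores as α = k/(k − 1) · (1 − Σσ²_item / σ²_total), where k is the number of items and σ²_total is the variance of respondents' summed scores. A minimal sketch using population variances; the variance divisor cancels in the ratio, so sample variances give the same result. The input layout (one list of scores per item) is an illustrative assumption.

```python
from statistics import pvariance


def cronbach_alpha(items):
    """Coefficient alpha; items is a list of k item-score lists, one score per respondent."""
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    totals = [sum(scores) for scores in zip(*items)]        # each respondent's total
    item_var = sum(pvariance(scores) for scores in items)   # sum of item variances
    return k / (k - 1) * (1 - item_var / pvariance(totals))
```

Two identical items give α = 1.0, while two items with zero covariance give α = 0.0.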



Insert Table 1 about here 

Criterion 1. Convergent validities were all statistically
significant (p < .05) for both the low track (Md r = .60) and
the high track (Md r = .69). Convergent validity for English SC
as measured by the Likert and Guttman scales, however, was only
moderate, although somewhat higher for the high track (low
track, r = .43; high track, r = .56).

Criterion 2. Convergent validities were consistently higher
than correlations between different traits assessed by
different methods (heterotrait-heteromethod triangles) for both
the low track (36 of 36 comparisons) and the high track (35 of
36 comparisons).

Criterion 3. Convergent validities were, for the most part,
higher than correlations between different traits measured by
the same method (heterotrait-monomethod triangles) for both the
low track (14 of 18 comparisons) and the high track (15 of 18
comparisons). In particular, the semantic differential and
Guttman scales both exhibited some method bias; this effect,
however, was stronger for the Guttman scales.

Criterion 4. For both tracks, the pattern of correlations
among the different traits was fairly similar across methods;
however, three correlations derived from the semantic
differential and Guttman measures differed disproportionately
across track.

Construct Validity Based on Confirmatory Factor Analyses
Goodness-of-fit indices for the series of MTMM models
tested are presented in Tables 2 and 3 for the low and high
tracks, respectively. Model 1 is the most restrictive model,
representing the null hypothesis that each observed measure is
an independent factor; it serves as the null model against
which competing models are compared in order to determine the
delta index. Models 2-4 represent decreasingly restrictive
models, such that Model 4 is the least restrictive, having both
correlated traits and correlated methods; it serves as the
baseline model since it represents hypothesized relations among
the traits and methods and, typically, demonstrates the best
fit to the data (see Footnote 2).



Insert Tables 2 and 3 about here 
Although for both tracks Model 4 represented the best fit
to the data, the fit, based on statistical criteria, was not
good. This lack of fit indicated some degree of
misspecification in the model (see Kaplan, 1987); it was
expected that the subsequent analyses would identify possible
areas of misspecification. Due to problems of estimation, as
well as other considerations (see Widaman, 1985), additional
fitting of the hypothesized model was not conducted. Model 4,
then, indicated that both the trait and method factors were
correlated. The method correlations for the low track, however,
were extremely weak, as indicated by the small, albeit
significant (p < .05), differences in statistical fit
(Δχ² = 9.48) and practical fit (difference in χ²/df = 0.0;
difference in delta = .02) between Model 4 and Model 3, in which
the methods were uncorrelated. These results suggest that for
the low track, the three measurement scales were operating
independently.

Evidence of convergent validity was tested by comparing
Model 4 with Model 5, in which no trait factors were specified.
As shown in Table 4, the Δχ² was highly significant for both
tracks, thus providing strong evidence of convergent validity
for the trait factors. Since complete discriminant validity of
traits implies zero intercorrelations, evidence of discriminant
validity can be tested by comparing the baseline model (Model 4)
with one in which perfect correlations among traits are
hypothesized (Model 6). The results in Table 4 indicate that for
both tracks, discriminant validity of the traits was evident, as
indicated by the highly significant differences in χ². Finally,
the discriminant validity of method factors (i.e., no method
bias) was tested by comparing Model 4 with Model 2, in which no
method factors were specified. Again, for both tracks, this
comparison yielded statistically significant Δχ² values,
suggesting fairly strong evidence of method bias effects.

Insert Table 4 about here 



To determine the extent to which each measurement scale was 
contributing to the method bias, Model 4 was further compared 
with three additional models, each of which eliminated one of 
the three methods. With one exception, each of the comparisons
indicated significant method effects; those associated with the
semantic differential for the low track were not significant.
The results in Table 4 demonstrate that while the Likert
measures made the heaviest contribution to method bias for the
low track, the Guttman measures were more important for the
high track. Scales contributing the least to method bias were
the semantic differential for the low track and the Likert
scale for the high track.

More precise assessments of trait- and method-related
variance can be made by examining the individual parameter
estimates as specified for Model 4. These results are
presented in Tables 5 and 6 for the low and high tracks,
respectively. The magnitudes of the trait loadings for both
tracks are generally consistent with the earlier convergent
validity findings (see Table 4); all loadings for the low track,
and all but one for the high track, were significant. With the
exception of academic SC, as measured by the Likert and Guttman
scales for the high track, each trait factor was well defined
by the hypothesized model.
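As a rough guide to reading trait and method loadings as variance shares, the sketch below partitions a standardized measure's variance into trait, method, and unique parts. It assumes each measure loads on exactly one trait and one method factor and that trait factors are uncorrelated with method factors (the usual CFA-MTMM specification); under those assumptions the shares are the squared loadings, and the remainder is the uniqueness. The function name and the loadings in the example are illustrative, not values from Tables 5 and 6.

```python
def partition_variance(trait_loading, method_loading):
    """Split a standardized measure's variance into trait, method, and unique shares.

    Assumes one trait and one method loading per measure, with trait and
    method factors uncorrelated, so the shares are the squared loadings.
    """
    trait = trait_loading ** 2
    method = method_loading ** 2
    unique = 1.0 - trait - method
    if unique < 0:
        raise ValueError("loadings imply a negative uniqueness (a Heywood case)")
    return {"trait": trait, "method": method, "unique": unique}
```

For instance, illustrative loadings of .80 (trait) and .30 (method) imply shares of .64, .09, and .27.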



Insert Tables 5 and 6 about here 

Method factor loadings, overall, tended to be larger for
the high track than for the low track. Method-related variance
for the high track was substantial for all but three
measurements; all parameter estimates were statistically
significant. In contrast, only seven of the 12 method parameters
were significant for the low track. Interestingly, the
measurement of general SC was associated with a modest degree
of method effects for each of the scales.

Discriminant validity of traits and methods is determined
by examining the factor correlation matrices. Results generally
supported earlier findings from the overall measures of
goodness-of-fit (see Table 4). However, evidence of trait
discriminant validity for the high track was less clear than
for the low track. Marsh and Hocevar (1983) noted that only
when correlations are extreme (i.e., approach unity) should
researchers be concerned about a lack of discriminant validity.
As such, claims of discriminant validity of the traits appear
justified for both tracks. However, Marsh and Hocevar also
argued that trait correlations should be consistent with the
underlying theory. This is not the case for the high track;
trait correlations are not fully consistent with SC theory
involving these particular traits. In particular, correlations
between academic SC and mathematics SC, and between English SC
and mathematics SC, typically yield values of approximately
.50 and .01, respectively (see, e.g., Byrne & Shavelson, 1986;
Marsh & Shavelson, 1985). As such, discriminant validity of the
traits for the high track cannot be clearly interpreted on the
basis of these findings.

Lack of discriminant validity among method factors was
clearly more evident for the high track than for the low track.
These findings suggest that whereas, for the most part, each
measurement scale operated independently for the low track,
this was not so for the high track; a higher degree of method
bias was evident.
Tests of Invariance

In testing for invariance, the parameters were estimated
simultaneously for each track. The first step was to test the
assumption of overall invariance across ability (i.e., is
there a difference between the low- and high-track
variance-covariance matrices?). Since this assumption was
rejected (χ²(78) = 199.64, p < .001), hypotheses related to the
invariance of traits and methods across ability were formally
tested by comparing a series of increasingly restrictive
models. Results from tests for the invariance of SC
measurements and structure are presented in Tables 7 and 8,
respectively.



Insert Table 7 about here 



The simultaneous four-factor solution for each group yielded a
reasonable fit to the data (χ²/df = 3.79). These results
suggest that, for both tracks, the data were fairly well
described by the general, academic, English, and mathematics
SC factors. Thus, a series of models was tested by comparing
one in which certain parameters were constrained to be equal
across track against one in which these parameters were free
to take on any value. For example, the hypothesis of an
invariant pattern of trait loadings was tested by constraining
these parameters to be equal across track and then comparing
this model (Model 2) with Model 1, in which only the number of
factors was held invariant. Since the difference in χ² was
significant (Δχ² = 239.09), this hypothesis was considered
untenable. Similarly, the hypothesis of an invariant pattern
of general SC loadings was tested, but found tenable.
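Each of these nested-model comparisons is a chi-square difference test: the χ² of the more constrained model minus the χ² of the freer model, referred to a χ² distribution whose df is the difference in df. A minimal Python sketch using only the standard library, assuming integer df; the helper names `chi2_sf` and `chi2_difference_test` are illustrative and not part of the original analysis.

```python
import math


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) of a chi-square variable with integer df."""
    if df < 1:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{j=0}^{df/2 - 1} (x/2)^j / j!
        term = total = 1.0
        for j in range(1, df // 2):
            term *= (x / 2) / j
            total += term
        return math.exp(-x / 2) * total
    # Odd df: erfc(sqrt(x/2)) plus a finite series in x^(1/2), x^(3/2), ...
    term = math.sqrt(x)  # x^(1/2) / 1
    series = 0.0
    for j in range(1, (df - 1) // 2 + 1):
        if j > 1:
            term *= x / (2 * j - 1)  # x^((2j-1)/2) / (1*3*...*(2j-1))
        series += term
    return math.erfc(math.sqrt(x / 2)) + 2 * math.exp(-x / 2) / math.sqrt(2 * math.pi) * series


def chi2_difference_test(chi2_restricted, df_restricted, chi2_free, df_free):
    """Test a constrained model against the freer model it is nested in."""
    delta_chi2 = chi2_restricted - chi2_free
    delta_df = df_restricted - df_free
    return delta_chi2, delta_df, chi2_sf(delta_chi2, delta_df)


# Table 8: general/English SC correlation held invariant vs. invariant measurement model
d_chi2, d_df, p = chi2_difference_test(339.84, 92, 321.79, 91)
print(f"delta chi2 = {d_chi2:.2f}, delta df = {d_df}, p = {p:.1e}")
```

For the Table 8 comparison shown, the sketch gives Δχ² = 18.05 on 1 df, with p well below .001; a constraint is retained only when p exceeds the chosen α.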

Given a nonsignificant Δχ², the specified factor loading
parameters were held cumulatively invariant, thus providing an
extremely powerful test of factorial invariance. Space
limitations preclude further elaboration of the invariance
testing procedures. However, the procedure in general is
described elsewhere (e.g., Jöreskog, 1971), as is an
application similar to the present one (Byrne & Shavelson, in
press).
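The cumulative procedure can be sketched as a screening rule that labels each more restricted model tenable or untenable relative to its reference model. The sketch assumes α = .05 and uses standard tabled χ² critical values; `screen_constraints` is a hypothetical helper. The input rows are Models 7 to 10 of Table 7, which, judging from their reported Δχ² values, are each compared with Model 5 (χ² = 318.13, df = 88).

```python
# Standard upper-tail chi-square critical values at alpha = .05 (rounded to 3 decimals).
CRITICAL_05 = {1: 3.841, 2: 5.991, 3: 7.815, 4: 9.488}


def screen_constraints(reference, candidates, critical=CRITICAL_05):
    """Flag each constrained model as tenable (nonsignificant delta chi2) or untenable.

    reference:  (chi2, df) of the model the candidates are compared with.
    candidates: list of (label, chi2, df) for models nested in the reference.
    """
    ref_chi2, ref_df = reference
    results = []
    for label, chi2, df in candidates:
        delta_chi2 = round(chi2 - ref_chi2, 2)
        delta_df = df - ref_df
        tenable = delta_chi2 < critical[delta_df]  # KeyError if delta_df is not tabled
        results.append((label, delta_chi2, delta_df, tenable))
    return results


# Table 7, Models 7 to 10, each compared with Model 5 (chi2 = 318.13, df = 88)
table7 = [
    ("Model 7 (SDQASC invariant)", 401.89, 89),
    ("Model 8 (APIASC invariant)", 318.19, 89),
    ("Model 9 (Model 8 + SCAASC invariant)", 462.32, 90),
    ("Model 10 (Model 8 + SDQMSC invariant)", 321.67, 90),
]
for label, d_chi2, d_df, ok in screen_constraints((318.13, 88), table7):
    print(f"{label}: delta chi2 = {d_chi2}, delta df = {d_df}, {'tenable' if ok else 'untenable'}")
```

The rule flags the SDQ III and SCAS academic SC constraints as untenable and the API academic SC and SDQ III mathematics SC constraints as tenable, the same pattern of significance reported in Table 7. A Δdf missing from the lookup raises `KeyError`, so the table would need extending for larger blocks of constraints.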

Insert Table 8 about here 

Overall, the results indicate that whereas all measures of
general SC and English SC were invariant across track, this was
not so for academic and mathematics SCs. Academic SC, as
measured by the SDQ III and SCAS, differed for the two groups.
Likewise, the API measurement of mathematics SC was not
consistent across track. Each of the method factors, and all
but one trait correlation, were found to differ significantly
across track; only the correlation between general and academic
SC was equivalent.

Summary and Discussion 
The construct validity of four SC traits (general SC,
academic SC, English SC, mathematics SC), as measured by three
different measurement scales (Likert, semantic differential,
Guttman) for low- and high-track students, was assessed using
both the Campbell-Fiske criteria and CFA procedures. In
general, the results from both analyses provided fairly strong
evidence of convergent validity, as well as evidence of method
bias, for both groups. CFA procedures, including tests of the
invariance of traits and methods across tracks, provided more
detailed insight into the group-specific aspects of these
findings.
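For readers less familiar with the Campbell-Fiske approach, the sketch below shows how the cells of a multitrait-multimethod matrix are classified and how two of the usual comparisons can be counted: each convergent validity against the heterotrait-heteromethod correlations in the same heteromethod block (commonly listed as criterion 2) and against the heterotrait-monomethod correlations that share one of its measures (criterion 3). The 2-trait by 2-method matrix is hypothetical, not data from this study, and the function names are illustrative; significance testing and the check on the pattern of trait interrelations are left out.

```python
from itertools import combinations


def classify_cells(labels):
    """Group the off-diagonal cells of an MTMM matrix by Campbell-Fiske cell type.

    labels: one (trait, method) tuple per measure, in matrix order.
    """
    cells = {"convergent": [], "heterotrait_monomethod": [], "heterotrait_heteromethod": []}
    for i, j in combinations(range(len(labels)), 2):
        (trait_i, method_i), (trait_j, method_j) = labels[i], labels[j]
        if trait_i == trait_j and method_i != method_j:
            cells["convergent"].append((i, j))
        elif trait_i != trait_j and method_i == method_j:
            cells["heterotrait_monomethod"].append((i, j))
        elif trait_i != trait_j:
            cells["heterotrait_heteromethod"].append((i, j))
    return cells


def count_criteria(r, labels):
    """Return [passed, total] comparisons for criteria 2 and 3."""
    cells = classify_cells(labels)
    result = {"criterion_2": [0, 0], "criterion_3": [0, 0]}
    for i, j in cells["convergent"]:
        block = {labels[i][1], labels[j][1]}
        for a, b in cells["heterotrait_heteromethod"]:
            # same heteromethod block and sharing a measure with the validity value
            if {a, b} & {i, j} and {labels[a][1], labels[b][1]} == block:
                result["criterion_2"][1] += 1
                result["criterion_2"][0] += r[i][j] > r[a][b]
        for a, b in cells["heterotrait_monomethod"]:
            if {a, b} & {i, j}:
                result["criterion_3"][1] += 1
                result["criterion_3"][0] += r[i][j] > r[a][b]
    return result


# Hypothetical 2-trait x 2-method example (not data from this study)
labels = [("A", "M1"), ("B", "M1"), ("A", "M2"), ("B", "M2")]
r = [
    [1.00, 0.30, 0.60, 0.20],
    [0.30, 1.00, 0.25, 0.55],
    [0.60, 0.25, 1.00, 0.58],
    [0.20, 0.55, 0.58, 1.00],
]
print(count_criteria(r, labels))
```

In this toy matrix the second convergent validity (.55) falls below one heterotrait-monomethod correlation (.58), the kind of pattern that points to method variance.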

Overall, the construct validity findings revealed four major
differences between low- and high-track students. First,
academic SC, as measured by the Likert and Guttman scales, was
problematic for the high track. Relatedly, the strongest method
loadings were associated with these same measures. It appears
that items on the Likert and Guttman scales measuring academic
SC elicited different types of responses from high- and
low-ability students. Quite possibly, different perceptions of
academic SC by the two groups of students bear importantly on
the problems of model misspecification noted earlier.
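The link between weak trait loadings and strong method loadings can be made concrete by splitting each measure's variance into trait, method, and error parts. The Python sketch below assumes a standardized solution with trait and method factors uncorrelated (as in footnote 2) and treats the Error/Uniqueness column of Table 6 as a variance, which the near-unit row totals suggest; shares are renormalized because the rounded entries do not sum exactly to 1. `decompose` is a hypothetical helper.

```python
def decompose(trait_loading, method_loading, uniqueness):
    """Share of an indicator's variance due to trait, method, and error.

    Assumes a standardized solution, uncorrelated trait and method factors,
    and a uniqueness reported as a variance. Shares are renormalized because
    rounded table entries rarely sum exactly to 1.
    """
    parts = {
        "trait": trait_loading ** 2,
        "method": method_loading ** 2,
        "error": uniqueness,
    }
    total = sum(parts.values())
    return {name: value / total for name, value in parts.items()}


# High track, academic SC (Table 6): (trait loading, method loading, uniqueness)
high_track_academic = {
    "Likert": (0.29, 0.70, 0.33),
    "semantic differential": (0.83, 0.54, 0.01),
    "Guttman": (0.14, 0.97, 0.04),
}
for scale, values in high_track_academic.items():
    shares = decompose(*values)
    print(scale, {name: round(share, 2) for name, share in shares.items()})
```

Under these assumptions the high-track Guttman academic SC measure is roughly 94% method variance and 2% trait variance, the Likert measure also carries far more method than trait variance, and the semantic differential measure is mostly trait variance.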

Second, discriminant validity of the trait factors was less
clear for the high than for the low track. However, this
finding may, in fact, be a measurement rather than a structural
problem. The fact that, for the high track, the Likert and
Guttman scales were in some way measuring academic SC
differently from the semantic differential scale indicates a
trait-method interaction effect, which likely contributes to
the poor discrimination among the trait factors.

Third, method bias was clearly more evident for the high
than for the low track. The large method intercorrelations
indicate that responses by high-ability students to items
measuring a particular trait would be similar, regardless of
which of the three scaling formats was used. In other words,
given a particular score on general SC as measured by, say,
the Likert scale, high-track students would be equally likely
to obtain a similar score on either the semantic differential
or the Guttman scale. When the impact of each method factor was
examined separately, these effects differed across track.
Whereas the Likert scales contributed the most method bias to
scores for the low track, the Guttman scales contributed the
most for the high track. Contributing the least to method bias
were the semantic differential and Likert scales for the low
and high tracks, respectively. However, these results,
particularly with respect to the Likert scale, are not
consistent with earlier findings based on the Campbell-Fiske
criteria.
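The per-scale comparison rests on the increase in χ² when each method factor is dropped from the baseline model (Table 4, Model 4 versus Models 7, 8, and 9): the larger the increase, the more that scale's method factor contributes. A small Python sketch of the ranking using those Table 4 values; `rank_method_bias` is an illustrative helper. Within a track the three comparisons share the same Δdf (6 for the low track, 7 for the high track), so ranking raw Δχ² is reasonable there, but raw values are not comparable across tracks because χ² grows with sample size.

```python
# Increase in chi-square when one method factor is dropped from the baseline
# model (Table 4: Model 4 vs Models 7, 8, and 9).
method_bias = {
    "low": {"Likert": 48.93, "semantic differential": 5.52, "Guttman": 27.79},
    "high": {"Likert": 124.11, "semantic differential": 152.46, "Guttman": 277.14},
}


def rank_method_bias(delta_chi2_by_scale):
    """Order scales from largest to smallest method-factor contribution."""
    return sorted(delta_chi2_by_scale, key=delta_chi2_by_scale.get, reverse=True)


for track, deltas in method_bias.items():
    ranking = rank_method_bias(deltas)
    print(f"{track} track: most = {ranking[0]}, least = {ranking[-1]}")
```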

Finally, tests of invariance formally tested, and
confirmed, earlier findings that the Likert and Guttman scales
differed in the measurement of academic SC across abilities;
this was also found to be so for mathematics SC, as measured by
the semantic differential scale. Furthermore, method bias
effects for each scale type, as well as all but one trait
correlation, were found to be noninvariant.

Taken together, the findings from this study demonstrate
that assumptions of equivalent construct validity across groups
cannot be taken for granted. Differences were found with
respect to both the measurement and the structure of SC. These
results have important implications for substantive research
focusing on mean differences in multidimensional SCs across
populations and, in particular, on measurements of general,
academic, English, and mathematics SCs across ability levels of
high school students.
Footnotes

1. A χ²/df ratio ranging from 1.00 to 5.00 (Wheaton, Muthén,
Alwin, & Summers, 1977) and a delta index > .90 (Bentler &
Bonett, 1980) are considered to indicate a reasonable fit to
the data.
2. For reasons related to identification and estimation
problems, correlations between trait and method factors were
fixed to zero for all analyses (see Schmitt & Stults, 1986;
Widaman, 1985).
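The degrees of freedom in Tables 2 and 3 follow from counting free parameters against the 78 unique variances and covariances of the 12 measures. The Python sketch below assumes one measure per trait-method pair, factor variances fixed to 1, a free uniqueness for every measure, and trait-method correlations fixed to zero as in footnote 2; `mtmm_df` is a hypothetical helper. Under these assumptions it reproduces the df reported in Table 3 for Models 1 through 6 and the 39 df of Models 7 and 9 in Table 2 (the high-track two-method models have one more df because an error variance was fixed; see Table 3, note a).

```python
def mtmm_df(n_traits=4, n_methods=3, trait_factors=True, method_factors=3,
            trait_corr="free", method_corr="free"):
    """Degrees of freedom of a CFA multitrait-multimethod model (counting sketch).

    trait_corr / method_corr: "free" (estimated), "zero" (fixed to 0), or
    "one" (fixed to 1.0, i.e., perfectly correlated factors).
    method_factors: how many method factors are modeled (0 to n_methods).
    """
    p = n_traits * n_methods             # one measure per trait-method pair
    moments = p * (p + 1) // 2           # unique variances and covariances
    params = p                           # one uniqueness (or variance) per measure
    if trait_factors:
        params += p                      # each measure loads on its trait
        if trait_corr == "free":
            params += n_traits * (n_traits - 1) // 2
    params += n_traits * method_factors  # loadings on the modeled method factors
    if method_corr == "free":
        params += method_factors * (method_factors - 1) // 2
    return moments - params


print(mtmm_df())                                       # -> 33 (Model 4, baseline)
print(mtmm_df(trait_factors=False, method_factors=0))  # -> 66 (Model 1, null)
```

The null model's 12 free variances are counted here as uniquenesses, which is why "no trait factors and no method factors" reproduces its df.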
Table 1

Multitrait-multimethod Matrix of Zero-order Correlations Among Self-concept
Measures for Low and High Tracks (a)



[Correlation matrix: GSC, ASC, ESC, and MSC, each measured by the Likert (SDQ III),
Semantic Differential (API), and Guttman (SCAS) scales; values not recoverable.]





        Likert (SDQ III)            Semantic Differential (API) Guttman (SCAS)
            GSC    ASC    ESC    MSC    GSC    ASC    ESC    MSC    GSC    ASC    ESC    MSC
Low Track
  M       76.00  49.58  54.92  41.89  76.88  70.39  57.83  44.88  31.18  24.80  25.33  23.02
  SD      13.40  12.40   9.45  13.37   9.07   8.84  10.62  10.61   4.84   4.47   4.84   5.82
  α         .91    .86    .73    .87    .83    .82    .87    .94    .85    .79    .84    .89
High Track
  M       75.71  57.77  57.47  49.00  76.76  73.72  61.75  47.24  31.45  30.26  28.90  26.25
  SD      14.58  11.78   9.93  16.92   9.44   9.59  11.21  11.64   5.07   4.94   5.73   7.97
  α         .94    .89    .81    .94    .88    .85    .89    .95    .88    .86    .90    .95



(a) Correlations for the low track are below the main diagonal, and those for the
high track are above the main diagonal.

Note. The underlined values are convergent validities. The values in solid
triangles are discriminant validities (heterotrait-monomethod correlations);
those in broken triangles are discriminant validities (heterotrait-heteromethod
correlations). All correlations > .11 are significant (p < .05). α = alpha
reliability coefficient; GSC = general self-concept (SC); ASC = academic SC;
ESC = English SC; MSC = mathematics SC; SDQ III = Self Description Questionnaire
III; API = Affective Perception Inventory; SCAS = SC of Ability Scale.
Table 2

Goodness-of-fit Indices for Multitrait-multimethod Models
Low Track (n = 252)

Model                                            χ²     df    χ²/df    delta

1. 12 uncorrelated factors (null model)     1681.06     66    25.47
2. 4 correlated trait factors,               216.26     48     4.51     .871
   no method factors
3. 4 correlated trait factors,               114.69     36     3.19     .914
   3 uncorrelated method factors
4. 4 correlated trait factors,               105.21     33     3.19     .937
   3 correlated method factors
   (baseline model)
5. no trait factors,                         868.09     51    17.02     .484
   3 correlated method factors
6. 4 perfectly correlated trait factors,     403.61     39    10.35     .760
   freely correlated method factors
7. 4 correlated trait factors,               154.14     39     3.95     .908
   2 correlated method factors
   (semantic differential, Guttman)
8. 4 correlated trait factors,               110.73     39     2.83     .932
   2 correlated method factors
   (Likert, Guttman)
9. 4 correlated trait factors,               133.00     39     3.41     .921
   2 correlated method factors
   (Likert, semantic differential)
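The χ²/df and delta columns can be recomputed from the χ² values alone. A minimal Python sketch, assuming that delta is the Bentler-Bonett normed fit index, (χ²_null - χ²_model) / χ²_null, with the 12-uncorrelated-factor model as the null, and applying the thresholds from footnote 1; the helper names are illustrative.

```python
def fit_summary(chi2_model, df_model, chi2_null):
    """Return (chi2/df, Bentler-Bonett normed fit index delta) for one model."""
    ratio = chi2_model / df_model
    delta = (chi2_null - chi2_model) / chi2_null
    return ratio, delta


def reasonable_fit(ratio, delta):
    """Footnote 1 criteria: 1.00 <= chi2/df <= 5.00 and delta > .90."""
    return 1.0 <= ratio <= 5.0 and delta > 0.90


NULL_LOW = 1681.06  # Table 2, Model 1 (12 uncorrelated factors), low track
for label, chi2, df in [("Model 2", 216.26, 48), ("Model 5", 868.09, 51), ("Model 6", 403.61, 39)]:
    ratio, delta = fit_summary(chi2, df, NULL_LOW)
    print(f"{label}: chi2/df = {ratio:.2f}, delta = {delta:.3f}, reasonable = {reasonable_fit(ratio, delta)}")
```

For Model 2 this gives χ²/df = 4.51 and delta = .871, matching Table 2; the ratio lies within the 1.00 to 5.00 range but delta is below .90, so the model does not meet both criteria.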
Table 3

Goodness-of-fit Indices for Multitrait-multimethod Models
High Track (n = 588)

Model                                            χ²     df    χ²/df    delta

1. 12 uncorrelated factors (null model)     5480.71     66    83.04
2. 4 correlated trait factors,               642.79     48    13.39     .883
   no method factors
3. 4 correlated trait factors,               302.70     36     8.41     .944
   3 uncorrelated method factors
4. 4 correlated trait factors,               185.98     33     5.64     .966
   3 correlated method factors
   (baseline model)
5. no trait factors,                        3114.75     51    61.07     .432
   3 correlated method factors
6. 4 perfectly correlated trait factors,    1484.21     39    38.06     .729
   freely correlated method factors
7. 4 correlated trait factors,               310.09    40a     7.75     .943
   2 correlated method factors
   (semantic differential, Guttman)
8. 4 correlated trait factors,               338.44    40a     8.46     .938
   2 correlated method factors
   (Likert, Guttman)
9. 4 correlated trait factors,               463.12    40a    11.58     .915
   2 correlated method factors
   (Likert, semantic differential)

(a) To offset the estimation of a Heywood case, the error variance of the
Self-concept of Ability Scale Form A was fixed to .01; this accounted for the
extra degree of freedom.
Table 4

Goodness-of-fit Indices for Comparison of Multitrait-multimethod Models (a)

                      Low Track                     High Track
                      Differences in                Differences in
Model Comparison             χ²   df   χ²/df  delta          χ²   df   χ²/df  delta

Tests of Added Components
Model 1 vs Model 2      1464.79   18   20.96      -     4837.92   18   69.65      -
Model 2 vs Model 3       101.57   12     .96    .04      340.09   12    4.98    .06
Model 3 vs Model 4        9.48*    3    0.00    .02      116.73    3    2.77    .02

Test of Convergent Validity
Model 4 vs Model 5       762.88   18   13.83    .45     2928.77   18   55.43    .53
  (traits)

Tests of Discriminant Validity
Model 4 vs Model 6       298.40    6    7.16    .18     1298.23    6   32.42    .24
  (traits)
Model 4 vs Model 2       111.05   15    1.32    .07      456.81   15    7.75    .08
  (methods)

Tests of Method Bias
Model 4 vs Model 7        48.93    6     .76    .03      124.11    7    2.11    .02
  (Likert)
Model 4 vs Model 8        5.52b    6     .36    .00      152.46    7    2.82    .03
  (semantic differential)
Model 4 vs Model 9        27.79    6     .22    .02      277.14    7    5.94    .05
  (Guttman)

* p < .05
(a) Unasterisked χ² difference values are statistically significant at p < .001.
(b) Not statistically significant.
Table 5

Factor and Error/Uniqueness Loadings, and Factor Correlations for Baseline Model: Low Track

                    Trait                                       Method                           Error/
Measure             1          2          3          4          I          II         III        Uniqueness

Likert
  general SC        .89*(.05)  .0         .0         .0         .07 (.07)  .0         .0         .20*(.05)
  academic SC       .0         .73*(.06)  .0         .0         .3?*(.11)  .0         .0         .37*(.07)
  English SC        .0         .0         .78*(.07)  .0         .41*(.16)  .0         .0         .23 (.16)
  mathematics SC    .0         .0         .0         .87*(.05)  .06 (.06)  .0         .0         .24*(.03)

Semantic Differential
  general SC        .57*(.06)  .0         .0         .0         .0         .46*(.16)  .0         .32*(.15)
  academic SC       .0         .77*(.06)  .0         .0         .0         .43*(.15)  .0         .21 (.11)
  English SC        .0         .0         .78*(.05)  .0         .0         .12 (.07)  .0         .37*(.06)
  mathematics SC    .0         .0         .0         .68*(.05)  .0         .05 (.08)  .0         .21*(.03)

Guttman
  general SC        .84*(.06)  .0         .0         .0         .0         .0         .01 (.05)  .30*(.05)
  academic SC       .0         .65*(.08)  .0         .0         .0         .0         .73*(.13)  .04 (.17)
  English SC        .0         .0         .?3*(.05)  .0         .0         .0         .27*(.07)  .53*(.06)
  mathematics SC    .0         .0         .0         .84*(.05)  .0         .0         .24*(.06)  .2?*(.04)

Factor Correlations
  Trait 1           1.0
  Trait 2           .59 (.06)  1.0
  Trait 3           .33*(.07)  .72*(.05)  1.0
  Trait 4           .34*(.06)  .52*(.06)  .08 (.07)  1.0
  Method I          .0         .0         .0         .0         1.0
  Method II         .0         .0         .0         .0         .11 (.17)  1.0
  Method III        .0         .0         .0         .0         .39*(.13)  .03 (.12)  1.0

All values of 1.0 and .0 are fixed values. All parameter estimates differing significantly
from zero are asterisked. Parenthesized values are standard errors of associated parameters.
SC = self-concept.
Table 6

Factor and Error/Uniqueness Loadings, and Factor Correlations for Baseline Model: High Track

                    Trait                                       Method                           Error/
Measure             1          2          3          4          I          II         III        Uniqueness

Likert
  general SC        .88*(.04)  .0         .0         .0         .19*(.05)  .0         .0         .18*(.02)
  academic SC       .0         .29*(.07)  .0         .0         .70*(.04)  .0         .0         .33*(.03)
  English SC        .0         .0         .66*(.04)  .0         .46*(.04)  .0         .0         .39*(.03)
  mathematics SC    .0         .0         .0         .78*(.03)  .??*(.04)  .0         .0         .07*(.01)

Semantic Differential
  general SC        .71*(.04)  .0         .0         .0         .0         .27*(.05)  .0         .40*(.03)
  academic SC       .0         .83*(.10)  .0         .0         .0         .54*(.08)  .0         .01 (.12)
  English SC        .0         .0         .82*(.04)  .0         .0         .53*(.05)  .0         .08*(.03)
  mathematics SC    .0         .0         .0         .72*(.03)  .0         .59*(.04)  .0         .07*(.02)

Guttman
  general SC        .88*(.04)  .0         .0         .0         .0         .0         .24*(.05)  .20*(.02)
  academic SC       .0         .14 (.07)  .0         .0         .0         .0         .97*(.04)  .04 (.03)
  English SC        .0         .0         .62*(.03)  .0         .0         .0         .59*(.04)  .33*(.03)
  mathematics SC    .0         .0         .0         .68*(.03)  .0         .0         .55*(.04)  .17*(.01)

Factor Correlations
  Trait 1           1.0
  Trait 2           .63*(.07)  1.0
  Trait 3           .??*(.06)  .20*(.07)  1.0
  Trait 4           .10 (.0?)  .08 (.08)  .46*(.05)  1.0
  Method I          .0         .0         .0         .0         1.0
  Method II         .0         .0         .0         .0         .89*(.02)  1.0
  Method III        .0         .0         .0         .0         .86*(.02)  .78*(.03)  1.0

All values of 1.0 and .0 are fixed values. All parameter estimates differing significantly
from zero are asterisked. Parenthesized values are standard errors of associated parameters.
SC = self-concept.
Table 7

Simultaneous Tests for the Invariance of Trait and Method Factor Loadings
Across Track

Competing Models                                  χ²    df   Δχ²       Δdf

Traits
1.  Four SC factors invariant (a)             311.41    82
2.  Model 1 with all SC loadings invariant    580.23    94   239.09***  12
3.  Model 1 with all general SC loadings      312.23    86   1.09        3
    invariant
4.  Model 1 with all general and academic     456.87    88   145.53***   6
    SC loadings invariant
5.  Model 1 with all general and English SC   318.13    88   8.99        8
    loadings invariant
6.  Model 1 with all general, English, and    334.68    91   33.54**     9
    mathematics SC loadings invariant
7.  Model 5 with SDQASC invariant             401.89    89   83.76***    1
8.  Model 5 with APIASC invariant             318.19    89   .06         1
9.  Model 8 with SCAASC invariant             462.32    90   144.19***   2
10. Model 8 with SDQMSC invariant             321.67    90   3.54        2
11. Model 10 with APIMSC invariant            332.82    91   14.60**     3
12. Model 10 with SCAMSC invariant            321.79    91   3.60

Methods
1.  Model 12 with Likert method factor        588.94    93   267.15***   2
    invariant
2.  Model 12 with semantic differential       468.17    93   146.38***   2
    factor invariant
3.  Model 12 with Guttman factor invariant    426.14    94   104.35***   3

** p < .01. *** p < .001.
(a) Baseline models with nonsignificant parameters fixed to 0.0.
SC = self-concept; SDQASC = Self Description Questionnaire III (SDQ III)
Academic SC subscale; APIASC = Affective Perception Inventory (API) Student
Self subscale; SCAASC = Self-concept of Ability Scale (SCAS) Form A;
SDQMSC = SDQ III Mathematics SC subscale; APIMSC = API Mathematics
Perceptions subscale; SCAMSC = SCAS Form C.



Table 8

Tests for the Invariance of Trait Correlations

Competing Models                                  χ²    df   Δχ²       Δdf

Traits
1.  Invariant measurement model (a)           321.79    91
2.  Model 1 with all trait correlations       400.80    95   167.81***   4
    invariant
3.  Model 1 with trait correlations made
    independently invariant
    a) general/academic SC                    321.88    92   .06         1
    b) general/English SC                     339.84    92   18.05***    1
    c) general/mathematics SC                 344.77    92   22.98***    1
    d) academic/English SC                    397.28    92   73.49***    1
    e) academic/mathematics SC                393.69    92   71.90***    1
    f) English/mathematics SC                 359.06    91

*** p < .001.
(a) Model 12 in Table 7.
SC = self-concept.




Figure Caption
Figure 1. Multitrait-multimethod Model of Data

M = method
T = trait
LIK = Likert scale
SD = semantic differential scale
GUTT = Guttman scale
GSC = general self-concept
ASC = academic self-concept
ESC = English self-concept
MSC = mathematics self-concept